[Core] Update cluster scheduler to handle label selector hard node id constraint #56235
Signed-off-by: Ryan O'Leary <ryanaoleary@google.com>
cc: @MengjinYan
if (auto node_id_values = GetHardNodeAffinityValues(spec.GetLabelSelector())) {
  for (const auto &node_id_hex : *node_id_values) {
    if (auto addr = node_addr_factory_(NodeID::FromHex(node_id_hex))) {
      return std::make_pair(addr.value(), false);
Something smarter we can do in a follow-up: if we have a list of nodes here, we should pick the node that holds the most arguments.
Actually, I'm wondering whether this logic belongs here, where the core_worker finds the raylet to send the requests to, or in the raylet logic that finds the best node to send the task to.
It's a good question. Currently it has to be here since raylet doesn't have location information of objects (owner does).
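The follow-up idea in this thread (prefer the candidate node that already holds the most arguments) could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from this PR: `ArgLocality`, `PickNodeWithMostArgs`, and the `resolvable_nodes` set are hypothetical names, node IDs are plain hex strings, and the set stands in for "the address lookup returned a value".

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical: number of task arguments the owner knows are stored on each node.
using ArgLocality = std::unordered_map<std::string, std::size_t>;

// Among the node IDs allowed by the hard constraint, return the resolvable one
// that holds the most arguments. Ties go to the earliest candidate. Returns
// std::nullopt when no candidate can be resolved to an address.
std::optional<std::string> PickNodeWithMostArgs(
    const std::vector<std::string> &candidate_node_ids,
    const ArgLocality &arg_counts,
    const std::unordered_set<std::string> &resolvable_nodes) {
  std::optional<std::string> best;
  std::size_t best_count = 0;
  for (const auto &node_id_hex : candidate_node_ids) {
    // Assumption: membership here mirrors a successful address lookup.
    if (resolvable_nodes.count(node_id_hex) == 0) {
      continue;
    }
    const auto it = arg_counts.find(node_id_hex);
    const std::size_t count = (it == arg_counts.end()) ? 0 : it->second;
    if (!best.has_value() || count > best_count) {
      best = node_id_hex;
      best_count = count;
    }
  }
  return best;
}
```

Because the owner is the component with object location information, a helper like this would sit on the core_worker side, consistent with the reply above.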
src/ray/raylet/scheduling/tests/cluster_resource_scheduler_test.cc
…t.cc Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com>
Seems to be some compilation error.
Looks like the
This should be passing now.
The Java test failure should be unrelated. cc: @jjyao
… constraint (ray-project#56235) Signed-off-by: Ryan O'Leary <ryanaoleary@google.com> Signed-off-by: Ryan O'Leary <113500783+ryanaoleary@users.noreply.github.com> Co-authored-by: Mengjin Yan <mengjinyan3@gmail.com> Co-authored-by: Jiajun Yao <jeromeyjj@gmail.com>
Why are these changes needed?
This PR updates the cluster scheduler to check for `ray.io/node-id` label selectors when calling `GetBestSchedulableNode`. If a feasible node with the desired ID is found, it is returned; otherwise the resource demand is marked infeasible and a nil `NodeID` is returned. Then, in the `ClusterLeaseManager`, we are able to check for node ID label constraints and return an unschedulable error when `scheduling_node_id.IsNil()`.

This behavior matches exactly how `NodeAffinitySchedulingPolicy` handles infeasible nodes when `soft=False`. This change is necessary for #54940, which replaces usages of `NodeAffinitySchedulingPolicy`, since otherwise a task with an unsatisfiable `ray.io/node-id` label selector remains pending indefinitely.

Related issue number
#51564
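
The behavior described under "Why are these changes needed?" can be sketched roughly as follows. This is a minimal, self-contained illustration rather than the code in this PR: `NodeView`, `NodeTable`, `ScheduleResult`, `PickNodeForHardNodeIdSelector`, `LeaseOutcome`, and `HandleLease` are hypothetical names, feasibility is reduced to a single total-CPU check, and `std::nullopt` stands in for a nil `NodeID`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified view of a node: only total CPUs are modeled.
struct NodeView {
  double total_cpus;
};

using NodeTable = std::unordered_map<std::string, NodeView>;

struct ScheduleResult {
  std::optional<std::string> node_id;  // std::nullopt plays the role of a nil NodeID
  bool is_infeasible;
};

// Returns the first node named by the hard node-id constraint that exists and
// is feasible for the request. Otherwise reports the demand as infeasible and
// returns no node.
ScheduleResult PickNodeForHardNodeIdSelector(
    const NodeTable &nodes,
    const std::vector<std::string> &required_node_ids,
    double cpus_needed) {
  assert(cpus_needed >= 0.0);
  for (const auto &node_id : required_node_ids) {
    const auto it = nodes.find(node_id);
    if (it != nodes.end() && it->second.total_cpus >= cpus_needed) {
      return ScheduleResult{node_id, false};
    }
  }
  return ScheduleResult{std::nullopt, true};
}

enum class LeaseOutcome { kGranted, kUnschedulable };

// Stand-in for the lease manager step: a missing node under a hard node-id
// constraint becomes an unschedulable error instead of staying pending.
LeaseOutcome HandleLease(const ScheduleResult &result) {
  return result.node_id.has_value() ? LeaseOutcome::kGranted
                                    : LeaseOutcome::kUnschedulable;
}
```

The point of returning an explicit "no node, infeasible" result is that the caller can fail fast, which is the hard-constraint behavior the description compares to `soft=False`.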
Checks
- … `git commit -s`) in this PR.
- … `scripts/format.sh` to lint the changes in this PR.
- … method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.